This paper presents the Crowd Score, a novel method to assess the funniness of jokes using large language models (LLMs) as AI judges. Our method relies on inducing different personalities into the LLM and aggregating the votes of the AI judges into a single score to rate jokes. We validate the votes using an auditing technique that checks, with the LLM, whether the explanation given for a particular vote is reasonable. We tested our methodology on 52 jokes in a crowd of four AI voters with different humour types: affiliative, self-enhancing, aggressive and self-defeating. Our results show that few-shot prompting leads to better results than zero-shot for the voting question. Personality induction showed that, on a set of aggressive/self-defeating jokes, aggressive and self-defeating voters are significantly more inclined to find jokes funny than affiliative and self-enhancing voters. The Crowd Score follows the same trend as human judges, assigning higher scores to jokes that human judges also consider funnier. We believe that our methodology could be applied to other creative domains such as stories, poetry, slogans, etc. It could help the computational creativity (CC) community adopt a flexible and accurate standard approach to comparing different work under a common metric, and, by minimizing human participation in assessing creative artefacts, it could accelerate the prototyping of creative artefacts and reduce the cost of hiring human participants to rate them.
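Below is a minimal sketch of the aggregation step, assuming the Crowd Score is simply the fraction of personality-induced AI voters that judge a joke funny; the data layout and the fraction-based aggregation are illustrative assumptions, not the paper's exact formulation.

HUMOUR_TYPES = ["affiliative", "self-enhancing", "aggressive", "self-defeating"]

def crowd_score(votes: dict[str, bool]) -> float:
    """Share of AI voters that judged the joke funny (0.0 to 1.0)."""
    return sum(votes[h] for h in HUMOUR_TYPES) / len(HUMOUR_TYPES)

# Example: one joke rated by the four personality-induced voters.
joke_votes = {
    "affiliative": True,
    "self-enhancing": True,
    "aggressive": False,
    "self-defeating": True,
}
print(crowd_score(joke_votes))  # 0.75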
Symbolic regression is a nonlinear regression method that is commonly performed with evolutionary computation approaches such as genetic programming. Quantifying the uncertainty of regression models is important for the interpretation of models and for decision making. The linear approximation and so-called likelihood profiles are well-known possibilities for calculating confidence and prediction intervals for nonlinear regression models. So far, these simple and effective techniques have been completely ignored in the genetic programming literature. In this work we describe the computation of likelihood profiles in detail and also provide some illustrative examples with models created with three different symbolic regression algorithms on two different datasets. The examples highlight the importance of likelihood profiles for understanding the limitations of symbolic regression models and for helping users make informed post-prediction decisions.
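For context, a minimal sketch of the standard profile t-function used for likelihood-based intervals in nonlinear least squares (in the style of Bates and Watts); this particular formulation and notation are assumptions, since the abstract does not spell out the exact computation:

\[
  \tau(\theta_p) = \operatorname{sign}(\theta_p - \hat{\theta}_p)\,
  \frac{\sqrt{\tilde{S}(\theta_p) - S(\hat{\theta})}}{s},
  \qquad
  \tilde{S}(\theta_p) = \min_{\theta_{-p}} S(\theta),
\]

where $S(\theta)$ is the sum of squared residuals, $\hat{\theta}$ the least-squares estimate, $s$ the residual standard error, and $\theta_{-p}$ the remaining parameters. An approximate $(1-\alpha)$ confidence interval for $\theta_p$ is then $\{\theta_p : |\tau(\theta_p)| \le t_{n-k,\,1-\alpha/2}\}$, with $n$ observations and $k$ parameters.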
The Brazilian Supreme Court receives tens of thousands of cases each semester. Court staff spend thousands of hours performing the initial analysis and classification of those cases, which takes effort away from later, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, which are stored both as images and as the corresponding text extracted via optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet and fine-tuned on the page images, and a convolutional network with filters of multiple kernel sizes trained from scratch on the document texts. We use them as extractors of visual and textual features, which are then combined through our proposed fusion module. Our fusion module can handle missing textual or visual input by using learned embeddings for the missing data. In addition, we experiment with bidirectional Long Short-Term Memory (BiLSTM) networks and linear-chain Conditional Random Fields to model the sequential nature of the pages. The multimodal approaches outperform both the textual and the visual classifiers, especially when leveraging the sequential nature of the pages.
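A minimal sketch of a fusion module of this kind, assuming concatenation-based fusion and learned placeholder embeddings substituted for a missing modality; the dimensions, layer sizes and module structure below are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class Fusion(nn.Module):
    def __init__(self, img_dim=512, txt_dim=256, num_classes=6):
        super().__init__()
        # Learned placeholders used when a page has no image or no OCR text.
        self.missing_img = nn.Parameter(torch.zeros(img_dim))
        self.missing_txt = nn.Parameter(torch.zeros(txt_dim))
        self.classifier = nn.Linear(img_dim + txt_dim, num_classes)

    def forward(self, img_feat=None, txt_feat=None):
        batch = img_feat.size(0) if img_feat is not None else txt_feat.size(0)
        if img_feat is None:
            img_feat = self.missing_img.expand(batch, -1)
        if txt_feat is None:
            txt_feat = self.missing_txt.expand(batch, -1)
        return self.classifier(torch.cat([img_feat, txt_feat], dim=-1))

# Example: a batch of pages with image features but missing OCR text.
logits = Fusion()(img_feat=torch.randn(4, 512), txt_feat=None)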
Deep clustering (DC) leverages the representational power of deep architectures to learn embedding spaces that are optimal for cluster analysis. This approach filters out low-level information irrelevant to clustering and has proven very successful for high-dimensional data spaces. Some DC methods employ Generative Adversarial Networks (GANs), motivated by the powerful latent representations these models are able to learn implicitly. In this work, we propose a new technique based on GANs with multiple generators (MGANs), which have not yet been explored for clustering. Our method is inspired by the observation that each generator of an MGAN tends to generate data that correlate with a sub-region of the real data distribution. We use this clustered generation to train a classifier that infers which generator a given image came from, thus providing a semantically meaningful clustering of the real distribution. In addition, we design our method so that it is performed in a top-down hierarchical clustering tree, thus proposing, to the best of our knowledge, the first hierarchical DC method. We carry out several experiments to evaluate the proposed method against recent DC methods, obtaining competitive results. Finally, we perform an exploratory analysis of the hierarchical clustering tree, highlighting how accurately it organizes the data in a hierarchy of semantically coherent patterns.
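A minimal sketch of the cluster-assignment idea: samples are labelled by the generator that produced them, a classifier is trained on these (sample, generator index) pairs, and real data are then assigned to clusters via the predicted generator. The toy Gaussian "generators" and the logistic-regression classifier below are stand-ins for the MGAN generators and the image classifier used in practice.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
centers = [(-3, 0), (0, 3), (3, 0)]            # one toy "generator" per mode
fake = np.vstack([rng.normal(c, 1.0, (200, 2)) for c in centers])
gen_idx = np.repeat(np.arange(len(centers)), 200)   # label = generator index

clf = LogisticRegression(max_iter=1000).fit(fake, gen_idx)

real = rng.normal((0, 3), 1.0, (5, 2))         # unseen "real" points
print(clf.predict(real))                       # cluster assignments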
COVID-19 quickly became a global pandemic only four months after its first detection. Detecting this disease as early as possible is crucial to reduce its spread. The use of chest X-ray (CXR) images became an effective screening strategy, complementary to reverse transcription polymerase chain reaction (RT-PCR). Convolutional neural networks (CNNs) are often used for automatic image classification and can be very useful in CXR diagnosis. In this paper, 21 different CNN architectures are tested and compared on the task of identifying COVID-19 in CXR images. They were applied to the COVIDx8B dataset, the largest and most diverse COVID-19 dataset available. Ensembles of CNNs were also employed and showed better efficacy than individual instances. The best individual CNN instance results were achieved by DenseNet169, with an accuracy of 98.15% and an F1 score of 98.12%. These were further increased to 99.25% and 99.24%, respectively, through an ensemble of five DenseNet169 instances. These results are higher than those obtained in recent works using the same dataset.
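A minimal sketch of the kind of ensembling described, assuming the members' per-class probabilities are averaged before taking the argmax; the abstract does not give the exact combination rule, so this is an illustrative assumption.

import numpy as np

def ensemble_predict(member_probs):
    """member_probs: array (n_models, n_images, n_classes) of softmax outputs."""
    return np.mean(member_probs, axis=0).argmax(axis=1)

# Example: 5 model instances, 3 images, 2 classes (COVID-19 vs. normal).
probs = np.random.dirichlet([1, 1], size=(5, 3))   # shape (5, 3, 2)
print(ensemble_predict(probs))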
Between 2015 and 2019, the members of a Horizon 2020-funded Innovative Training Network named "AMVA4NewPhysics" studied the customization and application of advanced multivariate analysis methods and statistical learning tools to high-energy physics problems, as well as developing entirely new ones. Many of those methods were successfully used to improve the sensitivity of data analyses performed by the ATLAS and CMS experiments at the CERN Large Hadron Collider; several others, still in the testing phase, promise to further improve the precision of measurements of fundamental physics parameters and the reach of searches for new phenomena. In this paper, we present the most relevant new tools among those studied and developed, along with an evaluation of their performance.
In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average-reward as the discount factor $\gamma$ goes to $1$, and moreover, when $\gamma$ is large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward. We further design a robust dynamic programming approach, and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
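As a rough illustration of the robust dynamic-programming flavour of this approach, here is a minimal sketch of robust relative value iteration on a tabular MDP, assuming the uncertainty set is a finite list of candidate transition kernels per state-action pair; the paper treats more general uncertainty sets, so everything below is an illustrative assumption rather than the paper's algorithm.

import numpy as np

def robust_rvi(rewards, kernels, iters=500):
    """rewards: (S, A); kernels: list of (S, A, S) candidate transition kernels."""
    S, A = rewards.shape
    v = np.zeros(S)
    for _ in range(iters):
        # Worst-case expected next value over the finite uncertainty set.
        worst = np.min([P @ v for P in kernels], axis=0)   # (S, A)
        q = rewards + worst
        v_new = q.max(axis=1)
        v = v_new - v_new[0]        # relative normalization w.r.t. state 0
    return v, q.argmax(axis=1)      # bias vector and a greedy robust policy

# Tiny 2-state, 2-action example with two candidate kernels.
r = np.array([[1.0, 0.0], [0.0, 2.0]])
P1 = np.full((2, 2, 2), 0.5)
P2 = np.array([[[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]]])
print(robust_rvi(r, [P1, P2]))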
Over the past decade, neural networks have been successful at making predictions from biological sequences, especially in the context of regulatory genomics. As in other fields of deep learning, tools have been devised to extract features such as sequence motifs that can explain the predictions made by a trained network. Here we intend to go beyond explainable machine learning and introduce SEISM, a selective inference procedure to test the association between these extracted features and the predicted phenotype. In particular, we discuss how training a one-layer convolutional network is formally equivalent to selecting motifs maximizing some association score. We adapt existing sampling-based selective inference procedures by quantizing this selection over an infinite set to a large but finite grid. Finally, we show that sampling under a specific choice of parameters is sufficient to characterize the composite null hypothesis typically used for selective inference, a result that goes well beyond our particular framework. We illustrate the behavior of our method in terms of calibration, power and speed, and discuss its power/speed trade-off with a simpler data-split strategy. SEISM paves the way to an easier analysis of neural networks used in regulatory genomics, and to more powerful methods for genome-wide association studies (GWAS).
We consider the problem of automatically generating stories in multiple languages. Compared to prior work in monolingual story generation, crosslingual story generation allows for more universal research on story planning. We propose to use Prompting Large Language Models with Plans to study which plan is optimal for story generation. We consider 4 types of plans and systematically analyse how the outputs differ for different planning strategies. The study demonstrates that formulating the plans as question-answer pairs leads to more coherent generated stories, while plans give the story creators more control.
Crop type maps are critical for tracking agricultural land use and estimating crop production. Remote sensing has proven an efficient and reliable tool for creating these maps in regions with abundant ground labels for model training, yet these labels remain difficult to obtain in many regions and years. NASA's Global Ecosystem Dynamics Investigation (GEDI) spaceborne lidar instrument, originally designed for forest monitoring, has shown promise for distinguishing tall and short crops. In the current study, we leverage GEDI to develop wall-to-wall maps of short vs tall crops on a global scale at 10 m resolution for 2019-2021. Specifically, we show that (1) GEDI returns can reliably be classified into tall and short crops after removing shots with extreme view angles or topographic slope, (2) the frequency of tall crops over time can be used to identify months when tall crops are at their peak height, and (3) GEDI shots in these months can then be used to train random forest models that use Sentinel-2 time series to accurately predict short vs. tall crops. Independent reference data from around the world are then used to evaluate these GEDI-S2 maps. We find that GEDI-S2 performed nearly as well as models trained on thousands of local reference training points, with accuracies of at least 87% and often above 90% throughout the Americas, Europe, and East Asia. Systematic underestimation of tall crop area was observed in regions where crops frequently exhibit low biomass, namely Africa and South Asia, and further work is needed in these systems. Although the GEDI-S2 approach only differentiates tall from short crops, in many landscapes this distinction goes a long way toward mapping the main individual crop types. The combination of GEDI and Sentinel-2 thus presents a very promising path towards global crop mapping with minimal reliance on ground data.
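A minimal sketch of the final mapping step, under the assumption that it reduces to standard supervised classification: a random forest is trained on Sentinel-2 time-series features with tall/short labels derived from GEDI shots, then applied to unlabeled pixels. The feature construction, array names and hyperparameters below are illustrative assumptions, not the study's exact setup.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_train, n_features = 1000, 24           # e.g. 12 months x 2 S2-derived bands
X_train = rng.random((n_train, n_features))    # Sentinel-2 time-series features
y_train = rng.integers(0, 2, n_train)          # 1 = tall crop (GEDI-derived label)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

X_new = rng.random((5, n_features))            # pixels to be mapped
print(model.predict(X_new))                    # 0 = short, 1 = tall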